Mitigating Effects of Recording Condition Mismatch in Speaker Recognition Using Partial Least Squares
Abstract
Speaker recognition systems have been shown to work well when recordings are collected in conditions with relatively limited mismatch. A significant focus of current research is therefore on techniques that keep system performance robust when greater variability is present. This study considers a diverse data set of recordings collected in multiple rooms with different types of microphones. Partial least squares (PLS), a technique recently introduced to the speaker recognition community, is investigated for decomposing the features and mitigating the performance degradation caused by room and/or microphone mismatch. The results suggest that PLS decomposition can provide substantial improvements in performance in the presence of mismatched recording conditions. These outcomes further validate partial least squares decomposition and encourage further consideration of PLS for reducing session and environment variability in speaker recognition tasks.
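The abstract does not specify the PLS variant, the feature representation, or how speaker identity enters the decomposition. As a minimal sketch, assuming one fixed-length feature vector per recording and speaker identity encoded as one-hot targets, the code below implements standard PLS2 regression via the NIPALS algorithm in NumPy, projects recordings onto the latent components, and scores trials by cosine similarity. The function names (`fit_pls`, `project`, `cosine_score`) and the synthetic "microphone" offsets are hypothetical illustrations, not the authors' implementation. The idea it illustrates is that PLS chooses directions that covary with the speaker targets, rather than directions of maximum raw feature variance, which may be dominated by room or microphone effects.

```python
import numpy as np

# Sketch only: generic NIPALS PLS2, not necessarily the configuration used in the paper.


def fit_pls(X, Y, n_components, tol=1e-10, max_iter=500):
    """X: (n, d) features; Y: (n, m) targets (assumed: one-hot speaker labels)."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xk, Yk = X - x_mean, Y - y_mean            # centred copies, deflated each round
    n, d = Xk.shape
    W = np.zeros((d, n_components))            # X weights
    P = np.zeros((d, n_components))            # X loadings
    Q = np.zeros((Y.shape[1], n_components))   # Y loadings
    T = np.zeros((n, n_components))            # latent scores of the training data
    for k in range(n_components):
        # Start from the Y column with the largest remaining variance.
        u = Yk[:, [int(np.argmax(Yk.var(axis=0)))]]
        for _ in range(max_iter):
            w = Xk.T @ u
            w /= np.linalg.norm(w)             # unit-norm weight vector
            t = Xk @ w                         # X score for this component
            q = Yk.T @ t / (t.T @ t)           # regress Y on the X score
            u_new = Yk @ q / (q.T @ q)         # updated Y score
            converged = np.linalg.norm(u_new - u) < tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p = Xk.T @ t / (t.T @ t)
        Xk = Xk - t @ p.T                      # remove this component from X
        Yk = Yk - t @ q.T                      # and from Y
        W[:, k], P[:, k], Q[:, k], T[:, k] = w[:, 0], p[:, 0], q[:, 0], t[:, 0]
    R = W @ np.linalg.inv(P.T @ W)             # maps centred X directly to scores
    return {"x_mean": x_mean, "y_mean": y_mean, "R": R, "Q": Q, "T": T}


def project(model, X):
    """Project feature vectors onto the learned PLS latent components."""
    return (X - model["x_mean"]) @ model["R"]


def cosine_score(a, b):
    """Cosine similarity between two latent vectors (trial score)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_spk, per_spk, dim = 3, 20, 8
    labels = np.repeat(np.arange(n_spk), per_spk)
    speaker_means = rng.normal(size=(n_spk, dim))
    channel = rng.normal(size=(2, dim))        # hypothetical microphone offsets
    mic = rng.integers(0, 2, size=labels.size)
    X = speaker_means[labels] + channel[mic] + 0.3 * rng.normal(size=(labels.size, dim))
    Y = np.eye(n_spk)[labels]
    model = fit_pls(X, Y, n_components=2)
    T = project(model, X)
    print("same speaker:", cosine_score(T[0], T[1]))
    print("different speaker:", cosine_score(T[0], T[-1]))
```

For production use, scikit-learn's `PLSRegression` in `sklearn.cross_decomposition` provides a maintained implementation of the same family of methods; the hand-written version above only makes the weight, score, and deflation steps explicit.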
Similar resources
Standoff speaker recognition: effects of recording distance mismatch on speaker recognition system performance
Speech can potentially be used to identify individuals from a distance and contribute to the growing effort to develop methods for standoff, multimodal biometric identification. However, mismatched recording distances for the enrollment and verification speech samples can potentially introduce new challenges for speaker recognition systems. This paper describes a data collection, referred to as...
A Bayesian network approach combining pitch and spectral envelope features to reduce channel mismatch in speaker verification and forensic speaker recognition
The aim of this paper is to reduce the effect of mismatch in recording conditions due to the transmission channel and recording device, using conditional dependencies of prosodic and spectral envelope features. The developed system is based on a Bayesian network framework which combines statistical models of the pitch and spectral envelope features. This approach is applied to forensic automati...
Forensic Automatic Speaker Recognition Using Bayesian Interpretation and Statistical Compensation for Mismatched Conditions
Nowadays, state-of-the-art automatic speaker recognition systems show very good performance in discriminating between voices of speakers under controlled recording conditions. However, the conditions in which recordings are made in investigative activities (e.g., anonymous calls and wire-tapping) cannot be controlled and pose a challenge to automatic speaker recognition. Differences in the phon...
Kernel Partial Least Squares for Speaker Recognition
I-vectors are a concise representation of speaker characteristics. Recent advances in speaker recognition have utilized their ability to capture speaker and channel variability to develop efficient recognition engines. Inter-speaker relationships in the i-vector space are non-linear. Accomplishing effective speaker recognition requires a good modeling of these non-linearities and can be cast as ...
On compensation of mismatched recording conditions in the Bayesian approach for forensic automatic speaker recognition
This paper deals with a procedure to compensate for mismatched recording conditions in forensic speaker recognition, using a statistical score normalization. Bayesian interpretation of the evidence in forensic automatic speaker recognition depends on three sets of recordings in order to perform forensic casework: reference (R) and control (C) recordings of the suspect, and a potential populatio...